INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE 



From Random Graph to Small World by Wandering 




Bruno Gaume — Fabien Mathieu 






N° 6489 

April 2008






Theme COM




ROCQUENCOURT




Theme COM: Communicating systems
Project GANG

Research report no. 6489, April 2008, 11 pages



Abstract: Numerous studies show that most known real-world complex networks share similar properties in 

Small-Worldization by Random Walks



Summary: Numerous studies show the remarkable fact that most so-called real-world networks share very particular, identical properties and belong to the class of small-world graphs. Another equally remarkable fact is that this class of small worlds is very small compared with the set of all possible graphs. In this article, we propose a method for producing small-world graphs by means of random walks.

Keywords: Random graphs, small worlds, random walks






1 Introduction 

In 1998, Watts and Strogatz showed that many real graphs, coming from different fields, share similar proper- 
ties [28]. This has been confirmed by many studies since this seminal work [l[Ml9llTllT3 i rr7l[6l[25l[5l [23irni l3]. 
The fields concerned include, but are not limited to: epidemiology (contact graphs, ...), economy (exchange graphs, ...), sociology (knowledge graphs, ...), linguistics (lexical networks, ...), psychology (semantic association graphs, ...), biology (neural networks, protein interaction graphs), IT (Internet, Web), etc. We call such graphs real-world complex networks, or small-world networks.

The common properties of real-world complex networks are a low diameter, a globally sparse but locally heavy edge density, and a heavy-tailed degree distribution. The combination of these properties is very unlikely in random graphs, which explains the interest that these networks have aroused in different scientific communities.

In this article, we propose a method to generate a graph with small-world properties from a random graph. This method, which is based on random walks, may be a first step towards understanding why graphs from various origins share a common structure.

In Section 2 we briefly state the properties used to decide whether a given graph is a small world or not. In Section 3 we survey the existing methods to generate complex networks. In Section 4 we analyse the dynamics of random walks in a graph, and in Section 5 we propose a new method to construct small worlds by wandering on random graphs. Section 6 concludes.

2 Small Worlds Structure 

Let $G = (V, E)$ be a reflexive, symmetric graph with $n = |V|$ nodes and $m = |E|$ edges. $G$ is called small world if the following properties are verified:

Edge sparsity Small-world graphs are sparse in edges, and the average degree stays low: $m = O(n)$ or $m = O(n \log(n))$.

Short paths The average path length (denoted $L$) is close to the average path length $L_{rand}$ in the main connected component of $\mathcal{G}(n, m) = \mathcal{G}(n, p = \frac{m-n}{n(n-1)})$ Erdős-Rényi graphs. According to [12], for $d := \frac{m-n}{n} > (1+\epsilon)\log(n)$, $\mathcal{G}(n, \frac{d}{n-1})$ is almost surely connected, and $L_{rand} \sim \frac{\log(n)}{\log(d)} = O(\log(n))$.

High clustering The clustering coefficient $C$, which expresses the probability that two distinct nodes adjacent to a given third node are adjacent, is an order of magnitude higher than for Erdős-Rényi graphs: $C \gg C_{rand} = p = \frac{m-n}{n(n-1)}$. This indicates that the graph is locally dense, although it is globally sparse.

Heavy-tailed degree distribution 

Example: DicoSyn.Verbe^1 is a reflexive symmetric graph with 9043 nodes and 110939 edges. For the sake of convenience, we only consider the main connected component $G_c$ of DicoSyn, which has 8835 nodes and 110533 edges. With an average degree of 12.5, $G_c$ is sparse. Other parameters of $G_c$ are $L \approx 4.17$ (to compare with $L_{rand} = 3.71$) and $C \approx 0.39$ (to compare with $C_{rand} = p = 0.0013$). The degree distribution is heavy-tailed, as shown by Figure 1 (a least-squares fit gives a slope of $-2.01$ with a confidence of 0.96). Therefore $G_c$ verifies the four properties of a small world.
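
As a quick sanity check of the figures quoted for $G_c$, the random-graph baselines can be recomputed in a few lines. This is only a sketch: it assumes that $m$ counts ordered pairs with the $n$ self-loops included, so that $d = (m-n)/n$, $p = (m-n)/(n(n-1))$ and $L_{rand} \approx \log(n)/\log(d)$. The function name `random_baselines` is ours.

```python
import math

def random_baselines(n: int, m: int) -> dict:
    """Erdos-Renyi baselines for a reflexive symmetric graph.

    Assumption: m counts ordered pairs (u, v), self-loops included.
    """
    d = (m - n) / n                      # average degree, self-loops excluded
    p = (m - n) / (n * (n - 1))          # edge probability, also C_rand
    l_rand = math.log(n) / math.log(d)   # estimate of the average path length
    return {"d": d, "p": p, "l_rand": l_rand}

# Main connected component of DicoSyn.Verbe, as reported in the text.
baselines = random_baselines(n=8835, m=110533)
print(baselines)
```

Under these assumptions the values agree with the reported $p = 0.0013$ and $L_{rand} = 3.71$ up to rounding.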

Note that the degree distribution of random Erdős-Rényi graphs is far from being heavy-tailed. It is in fact a kind of Poisson distribution: the probability that a node of a $\mathcal{G}(n,p)$ graph has degree $k$ is $p(k) = p^k (1-p)^{n-1-k} \binom{n-1}{k}$. See Figure 2, where the degree distribution of an Erdős-Rényi graph with the same number of nodes and average degree as $G_c$ is plotted. This illustrates how a small world compares to a $\mathcal{G}$ graph with the same number of nodes and expected degree:

• Same sparsity (by construction),

• Similar average path length,

• Higher clustering,

• Heavy-tailed distribution (instead of Poisson distribution).

^1 DicoSyn is a French synonym dictionary built from seven canonical French dictionaries (Bailly, Benac, Du Chazaud, Guizot, Lafaye, Larousse and Robert). The ATILF (http://www.atilf.fr/) extracted the synonyms, and the CRISCO (http://elsapl.unicaen.fr/) consolidated the results. DicoSyn.Verbe is the subgraph induced by the verbs of DicoSyn: an edge exists between two verbs a and b iff DicoSyn says that a and b are synonyms. Therefore DicoSyn.Verbe is a symmetric graph, made reflexive for convenience. A visual representation based on random walks [15] can be consulted at http://Prox.irit.fr

Figure 1: Degree distribution of $G_c$

Figure 2: Degree distribution of a typical $\mathcal{G}(n,p)$ graph
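
To make the contrast with a heavy tail concrete, the binomial degree law of a $\mathcal{G}(n,p)$ graph can be evaluated directly and compared with its Poisson approximation. The sketch below uses the rounded values $n = 8835$ and $p = 0.0013$ from the example; the function names are illustrative only.

```python
import math

def degree_pmf(n: int, p: float, k: int) -> float:
    """P(deg = k) for a node of G(n, p): Binomial(n - 1, p)."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

def poisson_pmf(lam: float, k: int) -> float:
    """Poisson approximation with mean lam = (n - 1) * p."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 8835, 0.0013
lam = (n - 1) * p
for k in (5, 11, 20, 40):
    print(k, degree_pmf(n, p, k), poisson_pmf(lam, k))
```

The probability of a degree as large as 40 is already negligible, whereas a power law with exponent close to 2 would still give such degrees a noticeable weight.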

In [3], Albert and Barabasi have made a survey on existing complex networks studies, including [4l [20l [9l fT| 
[HinilellSlllllllSlIIUlH]. Some of their findings are presented in Table 1, along with $G_c$'s properties.



Name                      n        <k>     L      C      γ      r²
DicoSyn.Verbes            8835     11.51   4.17   0.39   2.01   0.96
Internet routers          150000   2.66    11     -      2.4    -
Movie actors              212250   28.78   4.54   0.79   2.3    -
Co-authorship, SPIRES     56627    173     4.0    0.726  1.2    -
Co-authorship, math.      70975    3.9     9.5    0.59   2.5    -
Co-authorship, neuro.     209293   11.5    6      0.76   2.1    -
Ythan estuary food web    134      8.7     2.43   0.22   1.05   -
Silwood Park food web     154      4.75    3.40   0.15   1.13   -
Words, synonyms           22311    13.48   4.5    0.7    2.8    -


Table 1: Main properties of some complex networks 

3 Generating Small Worlds: State of the Art




Small-world networks have been studied intensely since they were first described by Watts and Strogatz [28]. Research has been carried out in order to generate random datasets with the well-known characteristics shared by social networks. Most papers focus either on the clustering and diameter, or on the power law.

3.1 Clustering and diameter property 

Watts and Strogatz [28], and Kleinberg [19] have studied families of random graphs that share the clustering and diameter properties of small worlds. The Watts and Strogatz model consists in altering a regular ring lattice by randomly rewiring some links. In Kleinberg's model, a d-dimensional grid is extended by adding extra links whose range follows a d-harmonic distribution.

Note that both models fail to capture the heavy-tail property met in real complex networks (they are almost regular).

3.2 Heavy-tail property 

There is a lot of research devoted to the production of random graphs that follow a given degree distribution [U [21] [22] [26]. Such generic models easily produce heavy-tailed random graphs if we give them a power-law distribution.

In the field of specific heavy-tailed models, there is Albert and Barabási's preferential attachment model [3] [6], in which links are added one by one, and where the probability that an existing node receives a new link is proportional to its degree. A more flexible version of the preferential attachment model is the fitness model [Hill!, where a pre-determined fitness value is used in the process of link creation.

Lastly, Aiello et al. proposed a model called $\alpha, \beta$ graphs [2], which encompasses the class of power-law graphs.

3.3 Other models

Other models of graph generation are Guillaume and Latapy's All Shortest Paths [18], where one constructs a graph by extracting the shortest paths of a random graph, and the Dorogovtsev-Mendes model [H]. Note that the latter captures all desired properties, but is not realistic.

4 Confluence & Random Walk in Networks

4.1 Random Walk in Networks

As in Section 2, $G = (V, E)$ is a reflexive, symmetric graph with $n = |V|$ nodes and $m = |E|$ edges. We assume that a particle wanders randomly on the graph:

• At any time $t \in \mathbb{N}$ the particle is on a node $u(t) \in V$;

• At time $t + 1$, the particle reaches a uniformly randomly selected neighbor of $u(t)$.

This process is a homogeneous Markov chain on $V$. A classical way to represent this chain is an $n \times n$ stochastic matrix $[G]$:

$$[G] = (g_{u,v})_{u,v \in V}, \quad g_{u,v} = \begin{cases} \frac{1}{\deg(u)} & \text{if } (u,v) \in E, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

Because $G$ is reflexive, no node has null degree, so the underlying Markov chain $[G]$ is well defined. For any initial probability distribution $P_0$ on $V$ and any given integer $t$, $P_0[G]^t$ is the result of the random walk of length $t$ starting from $P_0$ whose transitions are defined by $[G]$. More precisely, for any $u, v$ in $V$, the probability of being in $v$ after a random walk of length $t$ starting from $u$ is equal to $(\delta_u [G]^t)_v = ([G]^t)_{u,v}$, where $\delta_u$ is the certitude of being in $u$. One can demonstrate, by dint of the Perron-Frobenius theorem [23], that if $G = (V, E)$ is a connected, reflexive and symmetric graph, then:

$$\forall u, v \in V, \quad \lim_{t \to \infty} (\delta_u [G]^t)_v = \lim_{t \to \infty} ([G]^t)_{u,v} = \frac{\deg(v)}{\sum_{x \in V} \deg(x)} \tag{2}$$

In other words, provided that $t$ is large enough, the probability of being on node $v$ at time $t$ is proportional to the degree of $v$, and no longer depends on the departure node $u$.
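
The convergence towards the degree-proportional distribution can be observed numerically. The following sketch builds the transition matrix of a small toy graph (a hypothetical example of ours: two triangles joined by one edge, with a self-loop on every node) and iterates the walk from a single starting node.

```python
def transition_matrix(adj):
    """Row-stochastic matrix [G] of a reflexive symmetric graph.

    adj maps each node index 0..n-1 to the set of its neighbors (itself included).
    """
    n = len(adj)
    return [[1.0 / len(adj[u]) if v in adj[u] else 0.0 for v in range(n)]
            for u in range(n)]

def step(dist, matrix):
    """One step of the walk: row vector dist times the matrix [G]."""
    n = len(matrix)
    return [sum(dist[u] * matrix[u][v] for u in range(n)) for v in range(n)]

# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {u: {u} for u in range(6)}          # self-loops make the graph reflexive
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

matrix = transition_matrix(adj)
total_degree = sum(len(adj[u]) for u in adj)
stationary = [len(adj[v]) / total_degree for v in range(6)]

dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]     # certitude of being in node 0
for _ in range(200):
    dist = step(dist, matrix)
print(dist)
print(stationary)
```

After enough steps, the two printed vectors coincide up to floating-point error, whatever the starting node.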

4.2 Confluence in Networks 

Equation (2) tells us that the only information retained after an infinite random walk is the degree of the nodes. However, some information can be extracted from transitional states. For instance, assume the existence of three nodes $u$, $v_1$ and $v_2$ such that

• $u$, $v_1$ and $v_2$ belong to the same connected component,

• $v_1$ is close to $u$, in the sense that many short paths exist between $u$ and $v_1$,

• $v_2$ is distant from $u$,

• $v_1$ and $v_2$ have the same degree.

From (2), we know that the sequences $(([G]^t)_{u,v_1})_{1 \leq t}$ and $(([G]^t)_{u,v_2})_{1 \leq t}$ share the same limit, that is $\deg(v_1)/\sum_{x \in V} \deg(x) = \deg(v_2)/\sum_{x \in V} \deg(x)$.

However, these two sequences are not identical. Starting from $u$, the dynamics of the particle's trajectory on its random walk is completely determined by the graph's topological structure, and after a limited number of steps $t$, one should expect a greater value for $([G]^t)_{u,v_1}$ than for $([G]^t)_{u,v_2}$, because $v_1$ is closer to $u$ than $v_2$.

This can be verified on the graph of French verbs $G_c$, with:

• $u$ = déshabiller ("to undress"),

• $v_1$ = effeuiller ("to thin out"),

• $v_2$ = rugir ("to roar").

Intuitively, effeuiller should be closer (in $G_c$) to déshabiller than rugir, because this is the case semantically. Also, effeuiller and rugir have the same degree (11).

The values of $([G]^t)_{u,v_1}$ and $([G]^t)_{u,v_2}$ with respect to $t$ are shown in Figure 3(a), along with the common asymptotic value $\frac{\deg(v_1)}{\sum_{x \in V} \deg(x)}$.

Figure 3: $([G]^t)_{u,v_1}$ and $([G]^t)_{u,v_2}$ for (a) $G_c$ and (b) a random graph

One can observe that, after a few steps, $([G]^t)_{u,v_1}$ is above the asymptotic value. We claim that this is typical of nodes that are close to each other, and call this phenomenon strong confluence. On the other hand, $([G]^t)_{u,v_2}$ is always below the asymptotic value (weak confluence).
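
Strong and weak confluence can be reproduced on a very small example. The sketch below is not the DicoSyn experiment: it uses a hypothetical toy graph (two triangles joined by one edge, self-loops everywhere) in which $v_1 = 1$ is close to $u = 0$, $v_2 = 4$ is distant, and both have the same degree.

```python
def walk_distribution(adj, start, t):
    """Distribution of the particle after t steps, starting from `start`."""
    dist = {v: 0.0 for v in adj}
    dist[start] = 1.0
    for _ in range(t):
        nxt = {v: 0.0 for v in adj}
        for u, mass in dist.items():
            share = mass / len(adj[u])     # uniform choice among neighbors
            for v in adj[u]:
                nxt[v] += share
        dist = nxt
    return dist

# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {u: {u} for u in range(6)}           # reflexive
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

u, v1, v2 = 0, 1, 4                        # deg(v1) == deg(v2) == 3
limit = len(adj[v1]) / sum(len(adj[x]) for x in adj)   # common asymptotic value
for t in range(1, 6):
    dist = walk_distribution(adj, u, t)
    print(t, round(dist[v1], 4), round(dist[v2], 4), limit)
```

For these short walks, the probability of reaching $v_1$ stays above the asymptotic value while the probability of reaching $v_2$ stays below it.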
One could think that the existence of strong and weak confluences is typical of graphs with high clustering, because the notion of closeness sounds like belonging to the same community. However, strong and weak confluences also occur in graphs with low clustering coefficients, such as Erdős-Rényi random graphs. For example, Figure 3(b) shows $([G]^t)_{u,v_1}$ and $([G]^t)_{u,v_2}$ for three nodes $u$, $v_1$ and $v_2$ carefully selected in $\mathcal{G}$, an Erdős-Rényi graph with the same number of nodes and average degree as $G_c$.

Figure 3(b) is very similar to Figure 3(a). This points out that the concept of confluence exists in random graphs like it does in small worlds. In the following section, we will use this to turn random graphs into small worlds.



5 From Random Graph to Small World by Wandering

Now we want to use the concept of confluence to provide a way to construct small-world-like graphs. In order to do that, we introduce the mutual confluence conf between two nodes of a graph $G$ at a time $t$:

$$\mathrm{conf}_G(u, v, t) = \max([G]^t_{u,v}, [G]^t_{v,u}) \tag{3}$$

For not too large values of $t$, a strong mutual confluence between two nodes may indicate that those nodes are close. We claim that a good way to obtain a small world from a random graph is to set edges between the pairs of nodes with the highest confluence.
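
The mutual confluence of Equation (3) can be sketched as follows, on a hypothetical toy graph of ours (two triangles joined by one edge, self-loops everywhere). The helper `walk_distribution` is an illustrative name, not part of the original method.

```python
def walk_distribution(adj, start, t):
    """Row `start` of [G]^t: distribution after t steps from `start`."""
    dist = {v: 0.0 for v in adj}
    dist[start] = 1.0
    for _ in range(t):
        nxt = {v: 0.0 for v in adj}
        for u, mass in dist.items():
            share = mass / len(adj[u])     # uniform choice among neighbors
            for v in adj[u]:
                nxt[v] += share
        dist = nxt
    return dist

def mutual_confluence(adj, u, v, t):
    """conf_G(u, v, t) = max([G]^t_{u,v}, [G]^t_{v,u})."""
    return max(walk_distribution(adj, u, t)[v],
               walk_distribution(adj, v, t)[u])

# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {u: {u} for u in range(6)}           # reflexive
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

print(mutual_confluence(adj, 0, 2, 1))     # nodes 0 and 2 have degrees 3 and 4
print(mutual_confluence(adj, 0, 4, 3))
```

Since the walk on a symmetric graph is reversible, $\deg(u)[G]^t_{u,v} = \deg(v)[G]^t_{v,u}$, so the maximum in (3) is always reached when starting from the endpoint of lower degree.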

5.1 Extracting the confluence graph 

Given an input graph $G_{in} = (V, E_{in})$, symmetric and reflexive, with $n$ nodes and $m_{in}$ edges, a time parameter $t$ and a target number of edges $m$, one can extract a strong confluence graph $G = \mathrm{scg}(G_{in}, t, m)$ defined by:

• $G = (V, E)$ is a symmetric, reflexive graph with the same nodes as $G_{in}$ and $m$ edges,

• $\forall r \neq s, u \neq v \in V$, if $(r, s) \in E$ and $(u, v) \notin E$, then $\mathrm{conf}_{G_{in}}(r, s, t) \geq \mathrm{conf}_{G_{in}}(u, v, t)$.



Algorithm 1: scg (strong confluence graph), extract highest confluences

Input:  An undirected graph G_in = (V, E_in), with n nodes and m_in edges
        A walk length t ∈ N*
        A target number of edges m ∈ [n, n²]
Output: A graph G = (V, E), with n nodes and m edges
begin
    E ← ∅
    for i ← 1 to n do
        E ← E ∪ {(i, i)}                                /* Make G reflexive */
    end
    M ← n
    while M < m do                                      /* Are there unset edges? */
(a)     (r, s) ← argmax_{(u,v) ∉ E} conf_{G_in}(u, v, t)
(b)     E ← E ∪ {(r, s)}
(c)     E ← E ∪ {(s, r)}                                /* Stay symmetric */
        M ← M + 2
    end
end



Algorithm 1 proposes a way to construct $\mathrm{scg}(G, t, m)$. Note that because of possible confluences with the same values, line (a) is not deterministic. Furthermore, there is no guarantee that the strong confluence graph is unique, but the possible graphs can only differ by their (few) edges of lowest confluence. In practice, confluences are distinct most of the time^2.
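
A minimal Python sketch of the strong confluence graph extraction is given below. It is not the authors' implementation: instead of repeating the argmax of line (a), it sorts all pairs once by decreasing confluence (the confluences are computed on the input graph and never change), and it breaks ties with a total order on the pairs. Graphs are represented as hypothetical dictionaries of neighbor sets, and $m$ is assumed to count ordered pairs, self-loops included.

```python
def walk_distribution(adj, start, t):
    """Row `start` of [G]^t: distribution after t steps from `start`."""
    dist = {v: 0.0 for v in adj}
    dist[start] = 1.0
    for _ in range(t):
        nxt = {v: 0.0 for v in adj}
        for u, mass in dist.items():
            share = mass / len(adj[u])     # uniform choice among neighbors
            for v in adj[u]:
                nxt[v] += share
        dist = nxt
    return dist

def scg(adj, t, m):
    """Strong confluence graph: keep the pairs of highest mutual confluence.

    adj: dict node -> set of neighbors (symmetric, self-loops included).
    m:   target number of ordered pairs, self-loops included (assumption).
    """
    nodes = sorted(adj)
    rows = {u: walk_distribution(adj, u, t) for u in nodes}
    pairs = [(max(rows[u][v], rows[v][u]), u, v)
             for i, u in enumerate(nodes) for v in nodes[i + 1:]]
    # Highest confluence first; ties broken by the total order on (u, v).
    pairs.sort(key=lambda item: (-item[0], item[1], item[2]))
    out = {u: {u} for u in nodes}          # make G reflexive
    count = len(nodes)                     # M <- n
    for _, u, v in pairs:
        if count >= m:                     # no unset edge left
            break
        out[u].add(v)                      # add (r, s)
        out[v].add(u)                      # add (s, r): stay symmetric
        count += 2
    return out

# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {u: {u} for u in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

print(scg(adj, t=1, m=18))
```

With $t = 1$ the confluence of an edge is the inverse of the smaller of its two endpoint degrees, so on this toy graph the bridge 2-3 is the first edge to be dropped.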

Algorithm 2: makesw, Making a small world

Input:  A target number of nodes for the output graph n ∈ N
        A target number of edges for the random graph m_in ∈ N
        A walk length t ∈ N
        A target number of edges m ∈ N
Output: A graph G = (V, E), with n nodes and m edges
begin
    G_in ← a symmetric, reflexive, Erdős-Rényi random graph with n nodes and m_in edges
    G ← scg(G_in, t, m)
    G ← largest connected component of G
end



5.2 Making Small Worlds

We propose to construct graphs with a small-world structure by extracting the confluences of Erdős-Rényi graphs, as described in Algorithm 2. Note that the confluence extraction may produce disconnected graphs. Therefore we have to select the main connected component if we want to study properties like the diameter. However, our experiments show that the main connected component always contains more than 80% of the graph, so this is not such a big issue.
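
The whole pipeline of Algorithm 2 can be sketched in plain Python as follows. This is an illustration under assumptions, not the code used for the experiments: edge counts include self-loops and both orientations of each edge, the random graph is drawn with a fixed number of edges, and the parameters in the usage line are a scaled-down, hypothetical setting.

```python
import random

def walk_distribution(adj, start, t):
    """Row `start` of [G]^t: distribution after t steps from `start`."""
    dist = {v: 0.0 for v in adj}
    dist[start] = 1.0
    for _ in range(t):
        nxt = {v: 0.0 for v in adj}
        for u, mass in dist.items():
            share = mass / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        dist = nxt
    return dist

def scg(adj, t, m):
    """Keep the pairs of highest mutual confluence until m ordered pairs are set."""
    nodes = sorted(adj)
    rows = {u: walk_distribution(adj, u, t) for u in nodes}
    pairs = [(max(rows[u][v], rows[v][u]), u, v)
             for i, u in enumerate(nodes) for v in nodes[i + 1:]]
    pairs.sort(key=lambda item: (-item[0], item[1], item[2]))
    out = {u: {u} for u in nodes}
    count = len(nodes)
    for _, u, v in pairs:
        if count >= m:
            break
        out[u].add(v)
        out[v].add(u)
        count += 2
    return out

def random_graph(n, m_in, rng):
    """Reflexive symmetric random graph with m_in ordered pairs (loops included)."""
    adj = {u: {u} for u in range(n)}
    count = n
    while count < m_in:
        u, v = rng.sample(range(n), 2)     # two distinct nodes
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            count += 2
    return adj

def largest_component(adj):
    """Subgraph induced by the largest connected component."""
    seen, best = set(), set()
    for root in adj:
        if root in seen:
            continue
        comp, stack = {root}, [root]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return {u: adj[u] & best for u in best}

def makesw(n, m_in, t, m, seed=0):
    rng = random.Random(seed)              # fixed seed for reproducibility
    return largest_component(scg(random_graph(n, m_in, rng), t, m))

g = makesw(n=60, m_in=240, t=5, m=600)     # hypothetical small-scale parameters
print(len(g), sum(len(nbrs) for nbrs in g.values()))
```

The printed values are the number of nodes and of ordered pairs kept in the main connected component.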

5.3 Validation 

In order to obtain good small worlds, the values $n$, $m_{in}$, $m$ and $t$ must be carefully selected. In the following, we set $n = 1000$, $m_{in} = 4000$, and $m = 10000$, and we focus on the importance of the parameter $t$.

As stated in Section 2, there is no strict definition of a small world, but typical values for diameter, clustering and degree distribution. We arbitrarily propose to say that $G = \mathrm{makesw}(n, m_{in}, t, m)$ is small-world shaped if it verifies:

• $m < 10 n \log(n)$ (verified for $n = 1000$, $m = 10000$),

• its clustering coefficient Cg is greater than , 

• its diameter is lower than $3 \log(n)$,

• a least-squares fit on the log-log degree distribution gives a negative slope of absolute value $\lambda$ greater than 1, with a correlation coefficient greater than 0.8.
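
The criteria above can be checked with a few standard-library routines. The sketch below is one possible reading, with assumptions stated loudly: the clustering coefficient is taken as the fraction of neighbor pairs of a common node that are adjacent, logarithms are natural, the correlation coefficient is compared in absolute value, and the clustering threshold is left as a parameter (`min_clustering`, a hypothetical name) because its value is not given here.

```python
import math
from collections import deque
from itertools import combinations

def clustering(adj):
    """Fraction of pairs of distinct neighbors of a node that are adjacent."""
    closed = total = 0
    for u, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs - {u}), 2):
            total += 1
            closed += b in adj[a]
    return closed / total if total else 0.0

def diameter(adj):
    """Largest BFS eccentricity; assumes that adj is connected."""
    best = 0
    for root in adj:
        dist = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

def loglog_fit(adj):
    """Least-squares line through (log k, log count(k)); returns (slope, r)."""
    counts = {}
    for u, nbrs in adj.items():
        k = len(nbrs - {u})                # degree without the self-loop
        if k > 0:
            counts[k] = counts.get(k, 0) + 1
    if len(counts) < 2:
        return 0.0, 0.0                    # not enough points to fit a line
    xs = [math.log(k) for k in counts]
    ys = [math.log(c) for c in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return sxy / sxx, r

def is_small_world_shaped(adj, min_clustering):
    """Check the four criteria; min_clustering is a free parameter (assumption)."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values())   # ordered pairs, loops included
    slope, r = loglog_fit(adj)
    return (m < 10 * n * math.log(n)
            and clustering(adj) > min_clustering
            and diameter(adj) < 3 * math.log(n)
            and slope < -1 and abs(r) > 0.8)

# Two tiny reflexive graphs: a complete graph on 4 nodes and a path 0-1-2.
k4 = {u: {0, 1, 2, 3} for u in range(4)}
path = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2}}
print(clustering(k4), diameter(k4), clustering(path), diameter(path))
```

A complete graph is fully clustered but has a single degree value, so it fails the degree-distribution criterion whatever the clustering threshold.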

Remark The power-law estimation we give is not very accurate (see for instance [27]). However, giving a correct estimation of the odds that a given discrete distribution is heavy-tailed is a difficult issue ([HITO]), and refining the power-law estimation is beyond the scope of this paper.

It is easy to verify that with those requirements, a random Erdős-Rényi graph with 1000 nodes and 10000 edges is not a small world with high probability (for instance because of the clustering coefficient). On the other hand, $G = \mathrm{makesw}(n, m_{in}, t, m)$ verifies small-world properties for some values of $t$, as shown in Figure 4:

• The upper curve shows the diameter $L$ (remember that we only consider the main connected component, therefore the diameter is always well defined). The diameter is always low and consistent with a small-world structure.

• The next curve indicates the clustering coefficient $C$. For $2 < t < 40$, $C$ is very high. It drops after 40, as the confluences converge to the nodes' degrees, meaning that most of the edges come from the highest degree nodes of the input graph. This leads to star-like structures, which explain the poor clustering coefficient.

• The next two curves indicate that the degree distribution may be a power law, with a relatively high confidence, for $28 < t < 50$.

• Lastly, the lower curve summarizes the values of $t$ that verify the small-world requirements (mainly $28 < t < 40$).

^2 If uniqueness really matters, it suffices to use a total order on the pairs of $V$ in order to break ties in line (a).

Figure 4: Small-world properties of $G = \mathrm{makesw}(n, m_{in}, t, m)$ with respect to $t$.



6 Conclusion 

We proposed in this article a method to turn random graphs into small-world graphs by dint of random walks. This simple and intuitive method allows one to set a target number of nodes and edges. The resulting graphs possess all desired properties: low diameter, low edge density with a high local clustering, and a heavy-tailed degree distribution. This method is suitable for generating random small-world graphs, but it is only a first step towards answering the question: why are most real graphs small worlds, despite the fact that the small-world structure is very unlikely among possible graphs?

In order to be eligible for explaining small-world effects, a small-world generator should be based on local interactions. Therefore it should be decentralized, which is not the case of Algorithm 2. However, there exist variations of Algorithm 2 that can be decentralized: for instance, if we introduce a confluence bound $s$, an algorithm where each node $u$ decides to connect with any node it can find with a mutual confluence greater than $s$ has the same behavior as Algorithm 2 (but the number of edges $m$ is then indirectly set by the parameter $s$). Understanding the relationship between $m$ and $s$ is part of our future work.

Also note that the random walks we used in this first algorithm may be too long: for instance, Figure 4 shows that a length between 28 and 40 is needed to achieve small-world properties for a 1000-node graph, which is much larger than the expected diameter of a small-world graph of that size. We are currently working on a way to shorten the random walks by embedding a preferential attachment scheme [3] into our algorithm.